数据分析/运营 | 您所在的位置:网站首页 › Hive 累计每天的余额 › 数据分析/运营 |
文章目录
前言一、题目描述二、解题思路
前言
本文介绍一个经典的面试题:如何求用户的最大连续登陆天数。 一、题目描述有一个用户登陆表,表中有两列,一列是userid,代表用户的id;另一列是sigindate,代表用户的登陆日期。现在要求每个用户的最大连续登陆天数。 二、解题思路对于每一个用户,首先求出它的日期排名ranking, 然后再求出今天距离 登陆日期的日期天数 dates ,拿这个天数 减去 - 它的排名, 得到一个辅助列 diff,然后求出diff的最大计数,即为用户的最大连续登陆天数。是不是看的很迷糊?最大计数?没事,下面对这个思路进行详细的证明。 思路证明:现在有三个相邻(不一定连续)的日期d1——>d2——>d3,那么肯定有 ranking_d2 - rankingd1 = 1 (1) ranking_d3 - rankingd2 = 1 (2)假设d1和d2是连续的, 即 dates_d2 - date2_d1 = 1 (3)而d2和d3是不连续的(不妨假设相隔了2天),即 dates_d3 - date2_d2 = 2 (4)那么(1)-(3)有: (ranking_d2 - dates_d2) - (ranking_d1 - dates_d1)= 0即 ranking_d2 - dates_d2 = ranking_d1 - dates_d1,即diff1 = diff2(2)-(4)有: (ranking_d3 - dates_d3) - (ranking_d2 - dates_d2)= -1 ≠0即 ranking_d2 - dates_d2 ≠ ranking_d1 - dates_d1,即 diff1 ≠ diff2这就说明,如果两个日期d1和d2是连续的,那么它们的diff相同,如果不连续,那么它们的diff不同! 因此可以通过计数count(diff)来判断每一段连续的登陆天数持续了多久(这主要是因为存在用户a,一段时间连续登陆了2天,然后中间断了,后面又连续登陆了3天的情况), 然后再求每个计数的最大值,即为最大连续登陆天数,也就是最大计数。 下面以一个列子来说明这种算法,用到的数据表如下所示: 解题步骤: (1)求排名ranking,间隔天数dates,以及排名-天数得到diff select *, datediff(now(), sigindate)-1267 as dates, row_number() over(partition by userid order by sigindate desc) as ranking, (datediff(now(), sigindate)-1267 - row_number() over(partition by userid order by sigindate desc) ) as diff from t;可以看到,对于每一个用户而言,如果它的登陆日期sigindate是连续的,那么diff就会相同(具体是多少不重要)。 (2)对diff进行计数,即求出用户的所有连续活跃天数 select userid,diff, count(diff) as 'diff计数' from ( select *, datediff(now(), sigindate)-1267 as dates, row_number() over(partition by userid order by sigindate desc) as ranking, (datediff(now(), sigindate)-1267 - row_number() over(partition by userid order by sigindate desc) ) as diff from t ) as t1 group by userid, diff;(3)求diff计数的最大值,即求用户的最大活跃天数 select userid, max(diff计数) as '最大活跃天数' from ( select userid,diff, count(diff) as 'diff计数' from ( select *, datediff(now(), sigindate)-1267 as dates, row_number() over(partition by userid order by sigindate desc) as ranking, (datediff(now(), sigindate)-1267 - row_number() over(partition by userid order by sigindate desc) ) as diff from t ) as t1 group by userid, diff ) as t2 group by userid;整个过程可以简化为: select userid,diff, count(diff) as 'diff计数' from ( select userid,diff, count(diff) as 'diff计数' from ( select *, (datediff(now(), sigindate)-1267 - row_number() over(partition by userid order by sigindate desc) ) as diff from t ) as t1 group by userid, diff ) as t2 group by userid; |
CopyRight 2018-2019 实验室设备网 版权所有 |